
Control Hopper Env, Stand + Hop #440

Merged: 18 commits into haosulab:main, Jul 26, 2024
Conversation

@Xander-Hinrichsen (Collaborator) commented Jul 18, 2024

  1. Added the hopper xml, the hopper environment, and the hopper stand and hop tasks:
    MS-HopperStand-v1 & MS-HopperHop-v1

  2. Made small changes to the following files:
    mani_skill/envs/sapien_env.py - added the ability to mount a render-only camera to a robot articulation link. This lets the PPO evaluation videos follow the robot articulation (necessary to view the hopper and its movement throughout the episode). It makes no difference for _human_render_cameras if entity_uid isn't specified in the camera config, as is the case for all previous environments.
    mani_skill/utils/building/ground.py - added additional kwargs to the ground builder to allow rectangular grounds (no longer just square) and ground xy offsets (both needed to reproduce the dm_control/mujoco planar-agent ground visual). The API is unchanged if these kwargs aren't specified, as is the case for all previous environments.
    index - task documentation

  3. Made large changes to the following files:
    mani_skill/envs/utils/rewards/common.py - value_at_margin wasn't implemented, and the sigmoidal reward fns weren't guaranteed to return rewards between 0 and 1! Made changes so that it is equivalent to dm_control's reward tolerance (see the sketch below).
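For reference, a minimal sketch of dm_control-style reward tolerance with a gaussian sigmoid (batched torch tensors assumed). This only illustrates the behavior described above; it is not the exact code now in mani_skill/envs/utils/rewards/common.py:

import torch

def tolerance(x, lower=0.0, upper=0.0, margin=0.0, value_at_margin=0.1):
    """Returns 1 inside [lower, upper]; outside, decays smoothly toward 0,
    equaling value_at_margin at a distance of `margin` from the nearest bound."""
    in_bounds = torch.logical_and(lower <= x, x <= upper)
    if margin == 0:
        return in_bounds.float()
    # normalized distance outside the bounds
    d = torch.where(x < lower, lower - x, x - upper) / margin
    # gaussian sigmoid: equals value_at_margin exactly at d == 1, stays in (0, 1]
    scale = torch.sqrt(-2 * torch.log(torch.tensor(value_at_margin)))
    value = torch.exp(-0.5 * (d * scale) ** 2)
    return torch.where(in_bounds, torch.ones_like(value), value)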

These changes make it much easier to implement other planar agents (like walker and cheetah), but in my experience there are issues (like known and documented bugs in mjcf_loader.py and joint tuning difficulty) that make those other environments harder to implement.

image

image

The environment requires further tuning to more closely resemble the dm_control hopper.

5.mp4
2.mp4

args to recreate:
Hop
for i in {1..3}; do python ppo.py --exp_name="hopseed${i}" --env_id="MS-HopperHop-v1" --num_envs=2048 --update_epochs=8 --num_minibatches=32 --total_timesteps=20_000_000 --eval_freq=10 --num_eval_steps=100 --num_steps=100 --gamma=0.999 --seed=${i}; done
Stand
for i in {1..3}; do python ppo.py --exp_name="standseed${i}" --env_id="MS-HopperStand-v1" --num_envs=2048 --update_epochs=8 --num_minibatches=32 --total_timesteps=20_000_000 --eval_freq=10 --num_eval_steps=100 --num_steps=100 --gamma=0.99 --seed=${i}; done
HopRGB singleseed
python ppo_rgb.py --exp_name="rgbhopseed1" --env_id="MS-HopperHop-v1" --num_envs=1024 --update_epochs=8 --num_minibatches=32 --total_timesteps=20_000_000 --eval_freq=10 --num_eval_steps=100 --num_steps=100 --gamma=0.99 --seed=1

# dm_control also includes foot pressures as state obs space
def _get_obs_extra(self, info: Dict):
    obs = dict(
        toe_touch=self.touch("foot_toe"),
Member:

Does dm_control with visual data input ever use state information? This data will be part of the observation (along with _get_obs_agent()).

Collaborator Author:

I am now explicitly overriding _get_obs_state_dict to include the entire state (touch + proprioception), so the touch information isn't included in the rgbd obs mode.
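Roughly speaking (a sketch only, assuming the base class composes the full state dict from these pieces; not the exact code in this PR):

def _get_obs_state_dict(self, info: Dict):
    # state obs modes get proprioception plus the touch sensor readings;
    # rgbd obs modes never go through this override, so touch stays out of them
    return dict(
        agent=self._get_obs_agent(),
        extra=dict(toe_touch=self.touch("foot_toe")),
    )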

Collaborator Author:

dm_control uses a pixel wrapper, which allows the user to choose either the entire state + pixels or only pixels.

Member:

Ok looking at what seems to be standard practice, DrQ and TDMPC both do not use state at all in training, e.g. https://github.com/nicklashansen/tdmpc2/blob/5f6fadec0fec78304b4b53e8171d348b58cac486/tdmpc2/envs/wrappers/pixels.py#L8

Given the tasks are not that hard, we can do image-only PPO as well. The way I will do it is to fix the flatten rgbd observation wrapper to be like the pixel wrapper: you just choose if you want rgb, depth, and/or state, and it will handle it accordingly.
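A hypothetical usage sketch of that wrapper behavior (the wrapper name and keyword flags here are assumptions based on the description above, not a confirmed API):

import gymnasium as gym
import mani_skill.envs  # assumed: registers the MS-* environments
from mani_skill.utils.wrappers import FlattenRGBDObservationWrapper  # assumed import path

env = gym.make("MS-HopperHop-v1", obs_mode="rgbd")
# choose any combination of rgb / depth / state; here: pixels only, like dm_control's pixel wrapper
env = FlattenRGBDObservationWrapper(env, rgb=True, depth=False, state=False)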

Member:

Re this, merge the main branch into your PR. I have updated the code to do exactly this. The new ppo_rgb script is:

python ppo_rgb.py --exp_name="rgbhopseed1" --env_id="MS-HopperHop-v1" --num_envs=512 --update_epochs=8 --num_minibatches=32 --total_timesteps=20_000_000 --eval_freq=10 --num_eval_steps=600 --num_steps=50 --gamma=0.99 --seed=1 --no_include_state

(the hyperparams might not be well tuned).

    found_lost_pairs_capacity=2**25, max_rigid_patch_count=2**18
),
scene_cfg=SceneConfig(
    solver_position_iterations=15, solver_velocity_iterations=15
Member:

Why these configs for the solver? There is no need to try to match mujoco's solver configuration, since mujoco is a different physics engine from the one we use, which is PhysX.

@Xander-Hinrichsen (Collaborator Author), Jul 23, 2024:

Changed solver_position_iterations from the default of 15 to 4 for sim efficiency (15 was unnecessary); solver_velocity_iterations stays at the default value of 1.

    solver_position_iterations=15, solver_velocity_iterations=15
),
# sim_freq=50,
control_freq=25,
Member:

How is control_freq decided here? Was it to try to match dm_control?

@Xander-Hinrichsen (Collaborator Author), Jul 23, 2024:

control_freq was set so that the ratio of sim_freq to control_freq is 4:1 (same as the dm_control hopper); the code now explicitly sets sim_freq=100.
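Put together, the sim configuration described in these two replies looks roughly like this (SimConfig/SceneConfig field names are taken from the diff context above; treat the exact structure as an assumption):

sim_config = SimConfig(
    sim_freq=100,      # physics steps per second
    control_freq=25,   # 4 physics steps per control step, matching dm_control hopper's 4:1 ratio
    scene_cfg=SceneConfig(
        solver_position_iterations=4,  # lowered from the default of 15 for sim efficiency
        solver_velocity_iterations=1,  # default
    ),
)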

uid="cam0",
pose=Pose.create_from_pq([0, -2.8, 0], euler2quat(0, 0, np.pi / 2)),
width=128,
height=128,
Member:

is 128x128 correct? I recall dm-control uses smaller images

Collaborator Author:

I found that the images are 240 in height, 320 in width:

import dm_control.suite as suite
from dm_control.suite.wrappers import pixels
import numpy as np

env = suite.load("hopper", "hop")
env_and_pixels = pixels.Wrapper(env, pixels_only=True)

_, _, _, obs = env_and_pixels.reset()

print(obs['pixels'].shape) # prints (240, 320, 3)

this seems large, should we use the same dims?

Member:

Actually turns out people seem to use different resolutions. Let's just stick with 128x128 for now. It runs fast and is sufficient.

Just checked TDMPC2 code and they use 384x384. So not sure how this is picked usually.

for a in actor_builders:
    a.build(a.name)

# load in the ground
Member:

For hopper (and also cartpole), I recommend we add a massive long white wall behind, with color 0.3, 0.3, 0.3, so that the rendered videos look a bit nicer (this will match what ray-traced videos look like, since they include the ambient lighting, which is 0.3, 0.3, 0.3). Moreover, with a big background wall, anyone who wants to use depth image data can, and it won't be full of 0s (indicating no depth info since it goes to infinity).
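A hedged sketch of such a backdrop (builder method names assumed from SAPIEN/ManiSkill's actor builder API; the sizes and pose are purely illustrative):

import sapien

builder = self.scene.create_actor_builder()
builder.add_box_visual(
    half_size=[30.0, 0.1, 10.0],  # a long, thin wall spanning the plane of motion
    material=sapien.render.RenderMaterial(base_color=[0.3, 0.3, 0.3, 1.0]),  # matches ambient light
)
builder.initial_pose = sapien.Pose(p=[0, 2.0, 5.0])  # a few meters behind the agent
self.wall = builder.build_static(name="backdrop_wall")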

@Xander-Hinrichsen (Collaborator Author), Jul 23, 2024:

Created a new scene builder in mani_skill/utils/scene_builder/control/planar, for re-use with the other planar agents.
New eval visual (512x512):

6.mp4

Collaborator Author:

Added the wall to cartpole.py directly, since it doesn't fit the planar scene builder or warrant its own scene builder.

2.mp4

Member:

ok we might want to play around with lighting in the future to make these look a bit brighter, but no need for now.

@Xander-Hinrichsen (Collaborator Author) commented Jul 23, 2024

Modified the xml file's hip range, because the original hopper robot is bugged: its hip range allows self-intersection, which gets the robot stuck in a certain position with small probability on initialization:

dm_control_hopper_bug

@StoneT2000 (Member):

Is the ppo_rgb.py code updated? I seem to run into a few errors due to evaluating 100 steps instead of the max 600.

@Xander-Hinrichsen (Collaborator Author) commented Jul 23, 2024

> Is the ppo_rgb.py code updated? I seem to run into a few errors due to evaluating 100 steps instead of the max 600.

Yes, I added a 1-line change to ppo_rgb.py to get it to work in the new commit: there was no check that success exists in the final_info dict when writing to tensorboard in ppo_rgb.py, while there is such a check in ppo.py.
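For context, the guard is the kind of thing sketched below (variable names like final_info and writer are illustrative of the cleanrl-style PPO scripts, not the exact diff):

# control tasks such as the hopper have no success signal, so only log it when present
if "success" in final_info:
    writer.add_scalar(
        "charts/success_rate",
        final_info["success"].float().mean().item(),
        global_step,
    )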

for i in {1..3}; do python ppo_rgb.py --exp_name="pixels_hopseed${i}" --env_id="MS-HopperHop-v1" --num_envs=256 --update_epochs=8 --num_minibatches=8 --total_timesteps=25_000_000 --num-steps=100 --num_eval_steps=600 --gamma=0.995 --seed=${i}; done

image

performance is worse without the pressure sensor information

@StoneT2000 (Member) commented Jul 23, 2024

It is hard to tell if it's "worse" since this is a different return scale now? Also I tried running again, but the performance after 5M steps looks like this:

9.mp4

Looks like it learns to flip forward to go fast now, instead of the original behavior.

@StoneT2000 (Member) commented Jul 23, 2024

My script:

python ppo_rgb.py --exp_name="rgbhopseed1" --env_id="MS-HopperHop-v1" --num_envs=512 --update_epochs=8 --num_minibatches=32 --total_timesteps=20_000_000 --eval_freq=15 --num_eval_steps=600 --num_steps=100 --gamma=0.99 --seed=1 --no-include-state

(merge main for this to work)
This seems to perform better than the script you ran with these hyperparams. It does show some sign that it might learn the "right" behavior (it could also be that our hopper setup affords more optimal flipping than mujoco's).

image

Maybe the flipping behavior will go away if the agent does not have so much power behind its actions. I can try to experiment as well.

@Xander-Hinrichsen (Collaborator Author):

This has been my experience: the damping, stiffness, and delta are very important for the desired behavior. If the damping especially is too low, then the hopper is unable to upright itself; if it's too high, then it can instead flip to its feet too easily.
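For illustration, these are the knobs being discussed (a hypothetical controller config sketch; the class name, fields, joint names, and numbers are all assumptions, not the tuned values from this PR):

pd_joint_delta_pos = PDJointPosControllerConfig(
    joint_names=["waist", "hip", "knee", "ankle"],  # hypothetical hopper joint names
    lower=-0.5, upper=0.5,  # the "delta": max change in joint target per control step
    stiffness=100,          # proportional gain of the PD controller
    damping=10,             # too low: hopper can't right itself; too high: it flips upright too easily
    use_delta=True,         # assumed flag selecting the delta variant
)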

@StoneT2000 (Member):

Can you somehow revert to the settings that got the first video that was working? Or was it only working from state inputs?

@Xander-Hinrichsen (Collaborator Author) commented Jul 24, 2024

It was only working from state inputs; these are the same controller settings that produced the first videos.

@StoneT2000 (Member) commented Jul 24, 2024

Arguably though, with the flipping it seems like it achieves the same distance, if not further, than the standard "behavior". I would then say this is more a result of the reward function. Since it's not possible to get the exact same simulation as mujoco, maybe we can just penalize flipping (penalizing the x/y axis rotation velocity should be sufficient, whichever is the axis pointing toward the camera). It's fine if our reward function is slightly different from mujoco's.

I find many other repos tackling locomotion, like Isaac Lab / Orbit, often add reward terms not meant to make the robot run faster but to produce a smoother gait, which is essentially what we will be doing here.

@Xander-Hinrichsen (Collaborator Author):

Okay, that sounds good. I will try some experiments with a reward that discourages flipping.

@Xander-Hinrichsen (Collaborator Author) commented Jul 24, 2024

> Maybe the flipping behavior will go away if the agent does not have so much power behind its actions. I can try to experiment as well.

No need to experiment; I have found that y-axis rotation bounds for the reward work well. I will run tests and report here later tonight.

edit/update: it worked well for a quick check on state obs, to ensure no flipping with large deltas. However, I've tried many, many different things, and unfortunately no additional reward fixes the problem. In fact, I believe the flipping was originally a local minimum, where hopping would result in greater reward, since the flipping doesn't always activate the standing-reward portion of the hopping reward.

Currently testing out non delta joint pos.

@StoneT2000 (Member):

> Currently testing out non delta joint pos.

What do you mean by non delta joint pos? All of them should use that controller.

@Xander-Hinrichsen (Collaborator Author) commented Jul 25, 2024

> What do you mean by non delta joint pos? All of them should use that controller.

I meant that I tried pd_joint_pos instead of pd_joint_delta_pos; it didn't work though.

I've diagnosed the problem:
Originally, I tried to implement the robot's center-of-mass x velocity, but it didn't work, so I just used the x velocity of the slider joint holding the hopper in the xz plane (see the current implementation of the subtreelinvelx function).

Since the entire robot's center-of-mass x velocity isn't taken into consideration (only the torso x velocity, which is attached to the slider), the robot isn't penalized for flipping in a large circle as long as its torso continues to move in the +x world-frame direction.

Here is the original robot center-of-mass velocity function, which is slow and doesn't work:

    @property  # dm_control mjc function adapted for maniskill
    def subtreelinvelx(self):
        """Returns linvel x component of robot"""
        total_mass = 0.0
        vels = []
        links = self.agent.robot.get_links()
        # skip first three dummy links
        for i in range(3,len(links)):
            vel = links[i].get_linear_velocity() # vel is (b,3)
            mass = links[i].get_mass()[0]
            vels.append(vel * mass.item())
            total_mass += mass.item()
        vels = torch.stack(vels, dim=0) # (#links, b, 3)
        com_vel = vels.sum(dim=0) / total_mass # (b, 3)
        return com_vel[:,0]

It seems like the get_linear_velocity() function doesn't return the center-of-mass linear velocity of the links, which is causing the issue.

If calculating the center-of-mass velocity for an entire robot is non-trivial, this is an issue for the hopper + all planar + all humanoid dm_control task implementations, since they use com velocity as reward.

@StoneT2000 (Member):

Lemme get back to you. I think I also just found a bug with the GPU sim linear and angular velocities. CPU sim is correct, but on GPU sim they might be flipped. Working with @fbxiang now to debug it.

@StoneT2000 (Member) commented Jul 25, 2024

Actually you can use the torso link. It should work. That is the link they are reading the center of mass velocities from. All velocity data is center of mass actually.

e.g.

link = self.agent.robot.links_map["torso"]
print(link.linear_velocity, link.angular_velocity)

Also merge in main before doing more experiments. I just fixed #450, merged in #452, so the link velocities used for com velocities are wrong lol.

Thinking about it more, maybe it is also better to just penalize the y-axis rotation instead of the velocity (it's okay to go fast, just don't flip over).
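A hedged sketch of that alternative, assuming the planar hopper only rotates about the world y axis so the torso quaternion is approximately (w, 0, sin(theta/2), 0); hop_reward and the bounds are illustrative, and the tolerance helper is the dm_control-style one sketched earlier in this thread:

q = self.agent.robot.links_map["torso"].pose.q   # (b, 4) wxyz quaternions
pitch = 2 * torch.atan2(q[..., 2], q[..., 0])    # signed rotation about y, in radians
upright = tolerance(pitch, lower=-0.8, upper=0.8, margin=1.0)  # 1 when roughly upright, decays when flipped
reward = hop_reward * upright                    # going fast is fine, flipping over is not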

@StoneT2000 (Member) left a comment:

Small change requested that I didn't catch earlier. See other comments above.

@Xander-Hinrichsen (Collaborator Author) commented Jul 26, 2024

The results have large variance because, in both the hop and stand tasks, in seed 2 PPO learns to straighten its legs out too early and doesn't explore tucking them in when the hopper is on its belly in order to get up; PPO then gets stuck at this local optimum in seed 2. This is a result of a lack of exploration in PPO, not a fault of the environment, as shown below.

Standing results and command:

7.mp4

image

for i in {1..3}; do python ppo.py --exp_name="standseed${i}" --env_id="MS-HopperStand-v1" --num_envs=4096 --update_epochs=8 --num_minibatches=32 --total_timesteps=30_000_000 --eval_freq=10 --num_eval_steps=600 --num_steps=100 --gamma=0.99 --ent_coef=0.01 --seed=${i}; done

Hopping results and command:

12.mp4

image

for i in {1..3}; do python ppo.py --exp_name="hopseed${i}" --env_id="MS-HopperHop-v1" --num_envs=4096 --update_epochs=8 --num_minibatches=32 --total_timesteps=60_000_000 --eval_freq=10 --num_eval_steps=600 --num_steps=100 --gamma=0.99 --ent_coef=0.01 --seed=${i}; done

RGB Hop results and command:

52.mp4

image

python ppo_rgb.py --exp_name="pixelhop" --env_id="MS-HopperHop-v1" --num_envs=512 --update_epochs=8 --num_minibatches=32 --total_timesteps=40_000_000 --eval_freq=15 --num_eval_steps=600 --num_steps=100 --gamma=0.99 --seed=1 --no-include-state

Local Optima in PPO training

Seed 2 PPO local optimum:
PPO learns to straighten its legs out no matter what, and doesn't overcome this local optimum during training in seed 2.

7.mp4

Seeds 1 and 3 don't have this issue:

seed 1 stand:

7.mp4

seed 3 stand:

7.mp4

And even if the hopper learns to flip, it is a local minimum: it will eventually learn that hopping results in more reward than flipping, because the center-of-mass reward over the entire robot body correctly rewards hopping more than flipping.

Seed 3 of the hopping command above first learns to flip, then later learns how to hop to maximize the reward, eventually forgetting the flipping behavior.
Remnant of the flipping behavior shown during the middle of training:

9.mp4

@StoneT2000 StoneT2000 merged commit ebf0282 into haosulab:main Jul 26, 2024